Novel exported fusion enzymes with chorismate mutase and cyclohexadienyl dehydratase activity: Shikimate pathway enzymes teamed up in no man's land

Chorismate mutase (CM) and cyclohexadienyl dehydratase (CDT) catalyze two subsequent reactions in the intracellular biosynthesis of L-phenylalanine (Phe). Here, we report the discovery of novel and extremely rare bifunctional fusion enzymes, consisting of fused CM and CDT domains, which are exported from the cytoplasm. Such enzymes were found in only nine bacterial species belonging to non-pathogenic γ- or β-Proteobacteria. In γ-proteobacterial fusion enzymes, the CM domain is N-terminal to the CDT domain, whereas the order is inverted in β-Proteobacteria. The CM domains share 15% to 20% sequence identity with the AroQγ class CM holotype of Mycobacterium tuberculosis (*MtCM), and the CDT domains 40% to 60% identity with the exported monofunctional enzyme of Pseudomonas aeruginosa (PheC). In vitro kinetics revealed a Km <7 μM for the CM domains, much lower than for *MtCM, whereas kinetic parameters are similar for CDT domains and PheC. There is no feedback inhibition of CM or CDT by the pathway's end product Phe, and no catalytic benefit of the domain fusion compared with engineered single-domain constructs. The fusion enzymes of Aequoribacter fuscus, Janthinobacterium sp. HH01, and Duganella sacchari were crystallized and their structures refined to 1.6, 1.7, and 2.4 Å resolution, respectively. Neither the crystal structures nor the size-exclusion chromatography show evidence for substrate channeling or higher oligomeric structure that could account for the cooperation of CM and CDT active sites. The genetic neighborhood with genes encoding transporter and substrate binding proteins suggests that these exported bifunctional fusion enzymes may participate in signaling systems rather than in the biosynthesis of Phe.

Chorismate mutase (CM) catalyzes the rearrangement of chorismate to prephenate, which in turn is converted to phenylpyruvate by prephenate dehydratase (PDT), a cyclohexadienyl dehydratase (CDT) specific for prephenate (Fig. 1). Both CM and CDT are crucial enzymes for the biosynthesis of L-phenylalanine (Phe) in plants, fungi, and bacteria via the shikimate pathway (1,2). The activities of key enzymes in this pathway are tightly controlled because its products are energetically expensive, and the reactions are closely linked with other metabolic processes (1). Control over the metabolic flux in this pathway is typically exerted in bacteria by feedback inhibition of the first enzyme of the pathway, 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthase (3), and at the branch point of the pathway by regulating the activity of anthranilate synthase and of CM, the latter often in conjunction with an associated PDT or a fused prephenate dehydrogenase (PDH) domain (4–6).
Two families of CMs, the AroH and AroQ classes, are known, which have evolutionarily and structurally unrelated folds. The predominant family found in nature, and also in the organisms studied in this work, is the AroQ class, which can be further divided into four subclasses, AroQα to AroQδ. AroQ CMs have α-helical folds (1, 7–11), with the AroQ prototype, the AroQα CM domain from Escherichia coli (EcCM), consisting of two intertwined protomers forming a 6-helix bundle with two active sites. To enable metabolic pathway regulation, AroQ CMs frequently occur as bifunctional fusion enzymes or are involved in protein complexes. CMs fused to a PDT domain usually belong to the AroQα subclass and are found in the cytoplasm. For example, PheA in E. coli (also called P-protein) consists of the AroQα CM domain EcCM fused to a PDT domain followed by a third regulatory (also referred to as 'ACT') domain. Similar fusion proteins are found in most Proteobacteria (12,13). The CM and PDT activities of PheA are reduced by 55% and 85%, respectively, in the presence of excess Phe, the relevant end product of the biosynthetic pathway (14,15). Furthermore, E. coli also contains an AroQα CM domain fused to a PDH, forming the bifunctional fusion enzyme TyrA (also called T-protein) that is also common in enteric bacteria (12,13). Unlike PheA, TyrA does not have a third regulatory ACT domain; instead, feedback regulation in TyrA occurs through L-tyrosine (Tyr) binding near the active site of PDH, leading to full inhibition of PDH activity, whereas CM retains 93% of its activity (5).
Alternative modes of metabolic feedback control are known for the intracellular CMs from the AroQβ and AroQδ subclasses. The CM of Saccharomyces cerevisiae belongs to the AroQβ subclass and forms a dimer composed of 12 α-helices (8). Each subunit harbors a catalytic center, which is activated by L-tryptophan (Trp) and inhibited by Tyr upon binding to different allosteric sites at the interface of the two protomers (16). AroQδ CMs only occur in the cytoplasm of the taxonomic class Actinobacteria and include the well-studied enzymes of Mycobacterium tuberculosis and Corynebacterium glutamicum (17). The dimeric AroQδ CMs strongly resemble AroQα CMs, consisting of two protomers of three α-helices each. While AroQδ by itself shows <1% of the activity of a typical CM, it reaches its full potential upon forming a complex with its partner enzyme, the DAHP synthase (10). Complex formation allows for inter-enzyme allosteric control of CM activity via binding sites for the three aromatic amino acids on the DAHP synthase (DS) (17). In M. tuberculosis, the binding of Phe and Tyr inhibits intracellular CM activity by 72% and 37%, respectively (10). In the CM-DS enzyme system from C. glutamicum, metabolic feedback regulation by the aromatic amino acids is even more pronounced (18).
AroQγ subclass CMs are peculiar in that they possess an N-terminal signal sequence for export from the cytoplasm, the compartment where the shikimate pathway is localized. Exported CMs (*CMs) are mainly found in bacteria but are also present in some fungi and nematodes (19–21). The best characterized AroQγ family member is the exported CM from M. tuberculosis (*MtCM). It forms a homodimer, with each protomer consisting of six α-helices. While the four N-terminal helices contribute to the active site, a disulfide bond in the C-terminal part between helices five and six is thought to be essential for structural integrity when exported out of the cell (9). Notably, no feedback regulation was detected for *MtCM (19,22). Since biosynthesis of intracellular metabolites outside of the cell is not plausible, AroQγ enzymes were hypothesized to be candidates for virulence factors, especially for mammalian pathogens, as they are also found in Salmonella enterica serovar Typhimurium, Pseudomonas aeruginosa, and Yersinia pestis (13, 19, 22–24). This hypothesis is supported by the finding that *CMs from several phytopathogenic bacteria, fungi, and nematodes are involved in host invasion (20, 25–29).
P. aeruginosa also produces an exported CDT enzyme, PheC, which, in contrast to its cytoplasmic PDT counterpart (13), is not feedback regulated (30–32). PheC consists of two α/β subdomains connected by two flexible hinge strands, with the ligand binding site located at the interface of the two subdomains. Based on structural homology, PheC probably evolved from periplasmic solute-binding proteins and was reported to oligomerize to homotrimers (32). Since P. aeruginosa also produces an exported CM and possibly even an exported aromatic amino acid aminotransferase, which together could form a complete metabolic path from chorismate to Phe (13), it has been speculated that an excess of internally produced chorismate escaping the cell is captured and converted by this so-called "hidden overflow pathway" (32,33). However, the biological role of these exported enzymes remains controversial.
Here we report the discovery of novel and extremely rare exported bifunctional fusion enzymes with *CM and CDT domains, which appear to exist solely in non-pathogenic bacteria and should thus have a function other than virulence. Extensive kinetic, biophysical, and structural characterization and genetic neighborhood analysis are employed to gain insights into their functioning and shed some light on their biological role.

Highly rare, exported fusion enzymes with CM and CDT domains
The full amino acid sequence of the AroQγ enzyme *MtCM (19), including its N-terminal secretion signal (UniProt: P9WIB9), was taken as input for a BLASTP search in the NCBI database (34). Besides other monofunctional exported CMs with >80% sequence coverage, a few search hits stood out with 50% (or less) sequence coverage due to the presence of an additional protein domain. A further BLASTP search of this other domain revealed that it shared >40% identity with the exported cyclohexadienyl dehydratase PheC of P. aeruginosa (subsequently also referred to as *PaCDT). Furthermore, two distinct topologies of these exported bifunctional fusion enzymes were found: either the CM domain is N-terminally fused to a C-terminal cyclohexadienyl dehydratase domain (*CMCDT) or, vice versa, the CDT domain is N-terminal and the CM domain C-terminal (*CDTCM). The novel exported fusion proteins appear to be extremely rare and are only found in very few organisms, most of which belong to the taxonomic group of γ-Proteobacteria. These mostly aquatic species encompass Shewanella baltica (Sb), the main culprit for producing hydrogen sulfide on rotting fish (35–37), and Shewanella psychrophila (Sp) (38); Thalassomonas actiniarum (Ta) and Thalassomonas viridans (Tv), which were both suggested to be part of the natural microflora of sea anemone (39); Aequoribacter fuscus (Af), which belongs to the family Halieaceae that accounts for more than 10% of the total ocean surface bacterioplankton (40,41); and Steroidobacter cummioxidans (Sc), a rubber degrader that secretes oxygenases to cleave polyisoprenes (Fig. 2) (42).
*CDTCM sequences are even rarer than *CMCDTs and are solely found in very few species of the class of β-Proteobacteria. These include Duganella sacchari (Ds), isolated from the rhizosphere of sugar cane (43), Massilia phosphatilytica (Mp), from a genus closely related to the genus Duganella based on 16S rRNA sequence comparisons (43) and capable of solubilizing phosphate from rocky fertilizers (44), and Janthinobacterium sp. HH01 (Jb) from a biofilm-forming family known to synthesize antibacterial and antifungal compounds (45).
The clear division between exported *CMCDTs occurring only in γ-Proteobacteria and exported *CDTCMs only found in β-Proteobacteria (Fig. 2) argues against horizontal gene transfer of these exported bifunctional fusion enzymes between the two taxonomic groups. More likely, these genes arose by convergent evolution from different and independent primordial protein fusion events.
Additionally, the strictly conserved Cys residues forming a disulfide bond in *PaCDT are present in all CDT domains of the bifunctional fusion enzymes. In contrast, the two cysteines that form a disulfide bond in *MtCM are not conserved in the CM domains. The disulfide bond in *MtCM is located near the C-terminus in a segment that shows the highest sequence diversity among the bifunctional fusion enzymes (Figs. 3 and 4).
The genes of seven bifunctional fusion enzymes were custom-synthesized (Fig. S2) and cloned into plasmids for expression in E. coli for further investigation. Of the Shewanella and Thalassomonas genera, only the variants from S. baltica and T. actiniarum were further analyzed.

Complementation of CM and CDT-deficient mutant strains of E. coli by bifunctional fusion enzymes
The amino acid sequences of the four exported *CMCDTs (*ScCMCDT, *AfCMCDT, *TaCMCDT, and *SbCMCDT) and the three exported *CDTCMs (*JbCDTCM, *DsCDTCM, and *MpCDTCM), including their N-terminal natural signal peptides and an appended C-terminal His-tag, were reverse translated and codon-optimized for gene expression in E. coli. The custom-synthesized genes were cloned (without their signal sequence) into the expression vector pKTCTET-0 (46) for an initial assessment of heterologous CM and CDT activities in vivo inherent to the fusion proteins.
The in vivo complementation assay was carried out in E. coli strain KA12 (47), which is devoid of the two endogenous genes encoding the bifunctional CM-prephenate dehydrogenase (PDH) and CM-prephenate dehydratase (PDT, a cyclohexadienyl dehydratase specific for prephenate), thus rendering KA12 auxotrophic for growth in minimal medium M9c (46) that lacks Phe and Tyr. When KA12 is provided with the helper plasmid pKIMP-UAUC (48), which encodes a monofunctional PDH from Erwinia herbicola (truncated TyrA) and the CDT of P. aeruginosa (PheC/*PaCDT), only the CM activity is missing to restore growth in M9c medium. However, if KA12 is provided with the helper plasmid pKIMP-UA, which only encodes the truncated TyrA having only PDH activity, growth in M9c requires both an active CM and CDT.
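The selection logic of the two helper plasmids can be summarized as a small truth table. The following Python sketch is only an illustration of that logic, not part of the published method: the plasmid contents follow the description above, whereas the function name `grows_on_m9c` and the set-based encoding are hypothetical, and all downstream steps (for example, the endogenous aminotransferases) are assumed to be intact.

```python
# Illustrative sketch of the KA12 complementation logic (hypothetical helper names).
# Assumption: everything downstream of PDH and CDT is supplied by the host.

# Activities contributed by each helper plasmid, as described in the text.
HELPER_ACTIVITIES = {
    "pKIMP-UAUC": {"PDH", "CDT"},  # truncated TyrA (PDH) plus PheC (CDT)
    "pKIMP-UA": {"PDH"},           # truncated TyrA (PDH) only
}

# Tyr biosynthesis needs CM and PDH; Phe biosynthesis needs CM and CDT.
REQUIRED = {"CM", "PDH", "CDT"}


def grows_on_m9c(helper, test_activities):
    """Predict growth of KA12 on M9c medium lacking Phe and Tyr."""
    available = HELPER_ACTIVITIES[helper] | set(test_activities)
    return REQUIRED <= available


if __name__ == "__main__":
    for helper in HELPER_ACTIVITIES:
        for activities in (set(), {"CM"}, {"CDT"}, {"CM", "CDT"}):
            label = "+".join(sorted(activities)) or "none"
            print(helper, label, grows_on_m9c(helper, activities))
```

In this encoding, growth with pKIMP-UAUC reports on CM activity alone, whereas growth with pKIMP-UA reports on the combined CM and CDT activities of the tested construct.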
Single colonies of KA12/pKIMP-UAUC and KA12/pKIMP-UA, additionally transformed with a pKTCTET expression plasmid that carries a gene for one of the fusion proteins, were streaked out onto different M9c minimal agar plates. The plates contained different concentrations of tetracycline, the inducer of the promoter on pKTCTET, to probe for the strength of the in vivo complementation. Fig. S3 shows that all bifunctional enzymes complemented the CM deficiency in KA12/pKIMP-UAUC cells, producing well-visible colony growth within 4 days on M9c with 200 ng/ml tetracycline, except for *TaCMCDT, whose transformants grew much more slowly than all other clones. Since the CM reaction precedes the CDT reaction, the growth of KA12/pKIMP-UA transformants is also dependent on the activity of the CM domain. In fact, the observed growth phenotypes generally parallel those for the KA12/pKIMP-UAUC cells (Fig. S3). Impaired complementation of the combined CM+CDT deficiency was found for *SbCMCDT, particularly at a low gene induction level. After 5 days at 37 °C, all bifunctional fusion enzymes essentially fully complemented the CM+CDT deficiency in vivo, at least with 200 ng/ml tetracycline, except for *TaCMCDT, which lacked any CDT activity.

Bifunctional fusion enzymes show high catalytic efficiency in vitro
The catalytic activity of the bifunctional fusion enzymes was next investigated by kinetic activity measurements in vitro. The pKTCTET plasmids encoding the leaderless bifunctional fusion enzymes were transformed into the E. coli production strain KA29 (19), which harbors a chromosomally encoded T7 RNA polymerase for high transcription levels from the T7 promoter on pKTCTET-0 (46). Besides lacking CM and PDT genes to exclude a priori contamination with endogenous enzyme activity, KA29 is deficient in the thioredoxin reductase (trxB) gene. This renders the cytoplasm more oxidative and facilitates disulfide bond formation, which is expected to occur in the CDT domain. Cytoplasmically overproduced bifunctional fusion enzymes were purified via their C-terminal His-tag using Ni²⁺-NTA affinity chromatography. The eluates were analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), which indicated high integrity and purity (Fig. S4). All fusion enzymes could be produced in E. coli with good yields as soluble proteins, except for *TaCMCDT. The observation that *TaCMCDT remained insoluble may explain the impaired in vivo complementation in both KA12/pKIMP-UAUC and KA12/pKIMP-UA by this variant. Further in vitro kinetic studies were thus only performed with the six soluble bifunctional fusion enzymes.
The bifunctional fusion enzymes were subjected to measurements of their isolated CM and CDT activities as well as the coupled CM+CDT kinetic activity. The derived Michaelis-Menten kinetic parameters are listed in Table 1 with the data plots shown in Fig. S5.
CM and CDT activities were confirmed in vitro for all bifunctional fusion enzymes. Strikingly, all CM kinetics revealed a very low Km <7 μM, which is at least 25 times lower than the Km of 180 μM for *MtCM. Thus, the CM domains of the fusion enzymes can effectively bind very low concentrations of chorismate. The low Km values are coupled to slower rate constants, with kcat <7 s⁻¹ or <26 s⁻¹ determined for the *CMCDTs or the *CDTCMs, respectively, which is 2 to 25 times lower than the kcat of 50 s⁻¹ reported for *MtCM (19). This suggests that the low Km values may come at a cost. With the extraordinarily low Km values, the catalytic efficiencies kcat/Km of the CM domains from the bifunctional fusion enzymes reached 1 to 4 × 10⁶ M⁻¹ s⁻¹ for the *CMCDTs and 5 to 8 × 10⁶ M⁻¹ s⁻¹ for the *CDTCMs and are an order of magnitude higher than the kcat/Km of 2.7 × 10⁵ M⁻¹ s⁻¹ for *MtCM (19).
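The trade-off between a low Km and a low kcat can be made concrete with a short calculation. The Python sketch below is an illustration only: it uses the rounded *MtCM values quoted above (kcat = 50 s⁻¹, Km = 180 μM) and the wild-type *AfCMCDT CM values quoted later in the text (kcat = 2.2 s⁻¹, Km = 2 μM); the function names and the choice of 1 μM chorismate as an example concentration are assumptions. With the rounded inputs, the *MtCM efficiency comes out at about 2.8 × 10⁵ M⁻¹ s⁻¹, close to the reported 2.7 × 10⁵ M⁻¹ s⁻¹.

```python
# Illustrative Michaelis-Menten arithmetic (function names are hypothetical).

def catalytic_efficiency(kcat_per_s, km_micromolar):
    """Return kcat/Km in M^-1 s^-1 for a Km given in micromolar."""
    return kcat_per_s / (km_micromolar * 1e-6)


def turnover_rate(kcat_per_s, km_micromolar, substrate_micromolar):
    """Return v/[E] in s^-1 from the Michaelis-Menten equation."""
    return kcat_per_s * substrate_micromolar / (km_micromolar + substrate_micromolar)


if __name__ == "__main__":
    enzymes = {
        "*MtCM": (50.0, 180.0),        # kcat (s^-1), Km (uM), rounded literature values
        "*AfCMCDT (CM)": (2.2, 2.0),   # wild-type values quoted in the text
    }
    for name, (kcat, km) in enzymes.items():
        print(name,
              "kcat/Km = %.2e M^-1 s^-1" % catalytic_efficiency(kcat, km),
              "v/[E] at 1 uM chorismate = %.2f s^-1" % turnover_rate(kcat, km, 1.0))
```

Under these assumptions, the fusion enzyme turns over chorismate faster than *MtCM at 1 μM substrate despite its more than 20-fold lower kcat, which is the practical meaning of the higher kcat/Km.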
The kinetic parameters of the CDT domains from all bifunctional enzymes are similar to those of the monofunctional reference enzyme *PaCDT. Individual rate constants and Michaelis constants for the bifunctional fusion enzymes vary from 12 to 42 s⁻¹ and 3 to 40 μM, respectively, but generally are in the same order of magnitude as those of *PaCDT with a kcat of 18.4 s⁻¹ and a Km of 18.7 μM. All catalytic efficiencies lie around 10⁶ M⁻¹ s⁻¹, and no differences in CDT activity are apparent between the *CMCDT and *CDTCM variants.
The coupled CM+CDT kinetic measurement allowed first insights into whether the domain fusion influences the catalytic activity, as it simultaneously measures the two consecutive conversions from chorismate via prephenate to phenylpyruvate. In the absence of a domain fusion effect, the poorer of the individual kinetic parameters from both catalytic steps would limit the coupled reaction velocity and reveal the lower kcat and higher Km of either individual domain.
The results of the coupled CM+CDT assay for the *CMCDTs demonstrate that the rate constant (kcat) of the CM step was limiting, whereas the Michaelis constant (Km) was higher than for each of the individual steps for *ScCMCDT and *SbCMCDT. In contrast, *AfCMCDT exhibited a Km of 12 μM for the coupled CM+CDT activity, which is lower than the measured Km of 40 μM for CDT activity alone. This suggests that substrate saturation for the coupled CM+CDT reaction is reached at a lower substrate concentration. However, this effect could be explained by a lag or transient phase, which can occur in coupled enzyme reactions and lasts until steady-state conditions for the downstream enzymes are reached (49–52).
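The lag phase mentioned above can be visualized with a minimal numerical model of two consecutive Michaelis-Menten steps without any channeling. The Python sketch below is an assumption-laden illustration, not the analysis used in this work: the CM parameters (kcat = 2.2 s⁻¹, Km = 2 μM) and the CDT Km (40 μM) are the *AfCMCDT values quoted in the text, whereas the CDT kcat (20 s⁻¹), the enzyme concentration (10 nM), the starting chorismate concentration (50 μM), and the simple forward-Euler integration are hypothetical choices.

```python
# Illustrative simulation of a coupled CM -> CDT reaction without channeling.
# All concentrations are in micromolar and times in seconds.

def simulate_coupled(chorismate0=50.0, enzyme=0.01,
                     kcat_cm=2.2, km_cm=2.0,      # *AfCMCDT CM values from the text
                     kcat_cdt=20.0, km_cdt=40.0,  # CDT kcat is a hypothetical value
                     dt=0.05, t_end=600.0):
    """Integrate the two rate equations with a forward-Euler scheme.

    Returns a list of (time, v_cm, v_cdt) tuples and the final
    (chorismate, prephenate, phenylpyruvate) concentrations.
    """
    chorismate, prephenate, product = chorismate0, 0.0, 0.0
    trace = []
    for step in range(int(round(t_end / dt))):
        v_cm = kcat_cm * enzyme * chorismate / (km_cm + chorismate)
        v_cdt = kcat_cdt * enzyme * prephenate / (km_cdt + prephenate)
        trace.append((step * dt, v_cm, v_cdt))
        chorismate -= v_cm * dt
        prephenate += (v_cm - v_cdt) * dt
        product += v_cdt * dt
    return trace, (chorismate, prephenate, product)


if __name__ == "__main__":
    trace, final = simulate_coupled()
    for t, v_cm, v_cdt in trace[::2000]:  # print every 100 s
        print("t = %5.0f s  v_CM = %.4f  v_CDT = %.4f uM/s" % (t, v_cm, v_cdt))
    print("final concentrations (uM):", final)
```

In such a model, phenylpyruvate formation starts at zero and only gradually approaches the rate of the CM step as free prephenate accumulates, so initial rates read from the product signal can distort the apparent Km of the coupled reaction.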
A similar observation was made for the *CDTCM variants. This inverse topology also did not reveal an increased catalytic rate constant kcat for the coupled CM+CDT assay compared to the single CM or CDT assays. The Km values of the coupled CM+CDT assays lie between those for the individual catalytic steps, as for *AfCMCDT. Remarkably, the coupled CM+CDT reactions exhibited lower kcat values compared to the singly measured CM and CDT activities, suggesting that neither of the domains plays a dominant role in limiting catalysis. It also means that tethering the enzyme domains does not provide a catalytic benefit, which could plausibly stem from, for instance, a direct transfer of the first reaction's product to the substrate binding site for the second process.

Investigation of quaternary structures in solution
Structural insight into the relative topologies of the catalytic centers of a fusion enzyme is essential to study direct interactions that may, for instance, facilitate the rebinding of prephenate released from the CM domain by the CDT domain. Close spatial interactions would be possible by either having the CM and CDT active sites next to each other within the same folded single polypeptide or upon head-to-tail dimerization of two bifunctional proteins. To address the latter, we investigated the quaternary structure of selected bifunctional fusion proteins in solution.
Homodimer formation was reported for *MtCM (9), and homotrimer formation for *PaCDT (32). To investigate whether the exported bifunctional fusion enzymes form oligomeric structures in solution, size-exclusion chromatography (SEC) was performed with all variants included in this investigation except for *SbCMCDT and *TaCMCDT, as they did not yield sufficient soluble protein for SEC analysis.
All bifunctional enzymes eluted in a large prominent peak at around 15 ml that corresponds to the molecular mass of a monomer (Fig. 5). In addition, a small peak was observed between the 150 kDa and 44.3 kDa markers of the protein standard, corresponding in mass to a dimer. This peak was detected for all *CDTCM variants (Fig. 5B; red arrowhead) and possibly for *ScCMCDT, which displayed a small hump at this elution volume (Fig. 5A). There is no indication of dimer formation of *AfCMCDT; however, we noted a right-hand shoulder of the main peak (Fig. 5A), which could be due to conformational isomers, e.g., due to partial unfolding. We presume that the wide shoulder eluting ahead of the 670 kDa protein standard for *AfCMCDT and *DsCDTCM (Fig. 5, A and B) is caused by non-specific protein aggregation in these samples, which may involve the formation of non-native disulfide bonds. This is supported by an SDS-PAGE analysis of the higher-molecular-weight SEC fraction of *DsCDTCM, where the addition of DTT drastically reduced the number and intensity of bands in the high-molecular-weight region of the gel to a dominant band corresponding to the monomer (data not shown). The small dimer peak observed for most of the bifunctional fusion enzymes could also be explained by a general aggregation propensity.

Crystal structures of exported bifunctional fusion enzymes
To investigate the relative locations of the active sites within a single polypeptide, *AfCMCDT, *JbCDTCM, and *DsCDTCM were crystallized, and their structures were refined to 1.6, 1.7, and 2.4 Å resolution, respectively ([…] S1). Superimposition of *MtCM (Protein Data Bank (PDB) ID: 2FP2) (9) and *PaCDT (PDB ID: 5HPQ) (32) structures onto the CM and CDT domains of *AfCMCDT (PDB ID: 8CQ3; this work), *JbCDTCM (PDB ID: 8CQ4; this work) and *DsCDTCM (PDB ID: 8CQ6; this work) revealed highly similar structures in the bifunctional fusion enzymes (Fig. 6). Like *MtCM, the CM domains of *AfCMCDT, *JbCDTCM, and *DsCDTCM form six-helix bundles with a single active site in the N-terminal moiety. The CDT domains comprise two α/β subdomains that are connected by two antiparallel hinge β-strands, crossing between the subdomains twice, as known for *PaCDT (32). The structures of *DsCDTCM and *JbCDTCM are also very similar, both overall and in the active site (Fig. S6), with an r.m.s.d. of 1.4 Å (Cα atoms). We noticed that *JbCDTCM contained a 2-(N-morpholino)ethanesulfonic acid (MES) molecule from the buffer in its CDT active site (Figs. 6, S6C and S7C). *AfCMCDT captured acetate molecules from the crystallization buffer in both the CM and CDT active sites (Fig. S7, A and B).

Active sites
*JbCDTCM and *DsCDTCM share the same residues in both CM and CDT active sites (Fig. S6, B and C). Therefore, only *JbCDTCM will be discussed as representative of the *CDTCM topology. Six of ten active site residues of *MtCM are conserved in the CM domains of *AfCMCDT and *JbCDTCM (Fig. 7A) (9). Compared to *MtCM, the active site residues in *CDTCM and *CMCDT fusion enzymes differ at several positions, e.g., Q76V and T105S or T105A (in *JbCDTCM or *AfCMCDT, respectively), E106K, and E109Q (Fig. 7A). In *MtCM, Gln76 and Thr105 help to coordinate the ligand's carboxylates (9), which is no longer possible upon substitution with Val and Ala, respectively, as in *AfCMCDT, whereas the switch from Thr105 to a Ser in *JbCDTCM is more conservative and could still provide a hydrogen bond donor to the carboxylate via its hydroxyl group. Likewise, substituting Glu106 with Lys could, together with some conformational adaptation of the side chain, preserve coordination of the ligand's hydroxyl group (shown for *MtCM) (9), and substituting Glu109 with Gln could accomplish coordination of the ether oxygen via its amide hydrogen instead of the carboxyl group. However, by replacing Glu106 and Glu109 with positively charged or neutral residues, the charge balance of the active site is affected. Importantly, this likely explains why the fusion enzymes exhibit lower rate constants kcat and increased substrate affinity (lower Km) in the CM kinetic assays, compared to *MtCM (Table 1).
a Values in parentheses refer to highest resolution shell. b Rfree was calculated from 5% of randomly selected reflections for each data set.

In contrast to the analyzed CM domains, all nine residues forming the active site in *PaCDT (32) are fully conserved not only in the CDT domains of *AfCMCDT and *JbCDTCM (Fig. 7B), but in all investigated exported bifunctional fusion enzymes (Figs. 3 and 4).The CDT domains of the three bifunctional enzymes show very high structural similarity to *PaCDT, and congruently the determined CDT kinetic parameters for *AfCMCDT and *JbCDTCM are highly similar to those of *PaCDT (Table 1).
In the active site of the *JbCDTCM structure, we discovered an MES molecule (Fig. S7C). MES resembles the CDT product, phenylpyruvate (Fig. 7C), and thus allows for a first glimpse into enzyme-ligand interactions. The carboxylate group of phenylpyruvate is replaced by a sulfonate group in MES, and the aromatic ring by a morpholine ring (Fig. 7C). The sulfonate group of MES forms hydrogen bonds with the hydroxyl group and backbone amide of Ser107 and Thr159 in *JbCDTCM and interacts by charge complementarity with the side chains of Arg112 and Lys225 (Figs. 7B and S6C). Water-mediated protein interactions of the CDT domain with the MES sulfonate group involve Lys127, Asp192, Asn155, and Thr159. The morpholine ring is stabilized by interactions with the side chains of Tyr50 and Trp87. The oxygen atom in the morpholine ring is hydrogen-bonded to the side chain amide nitrogen of Asn179 (Figs. 7B and S6C).
Since MES seemed to mimic phenylpyruvate, the product of the CDT reaction, we tested whether MES could act as an inhibitor of *JbCDTCM. However, as shown in Fig. S8, MES has no inhibitory effect on CDT activity within the tested concentration range from 20 μM to 20 mM.

No structural evidence for substrate gating and channeling
Since fused enzymes often exhibit cooperation between active sites, we were interested in the relative placements of catalytic sites within the bifunctional proteins. In *AfCMCDT, representing the *CMCDT topology, the presumed substrate access to the CDT domain faces away from the expected product exit site of the CM domain (9) (Fig. 6, A and D), with a distance of 35 Å between active sites. In the two *CDTCMs, the distances between CM and CDT active sites are similarly large, measuring approximately 40 Å (Fig. 6, B, C, and E), thus […] and S9, A and B). The buried surface area of this *DsCDTCM dimer was estimated to be 3070 Å², with the free energy of dimer dissociation being 5.8 kcal/mol, as calculated using PDBePISA (53). The dimerization interface is mainly lined with hydrophilic and charged residues (Fig. S9, C and D). Small humps reminiscent of *CDTCM dimers in size-exclusion chromatography (Fig. 5B) could be explained by the transient formation of such dimers in the protein preparations, in line with predictions that complex formation requires a minimum buried surface area of 1200 Å² (54,55).
The crystal structure of *DsCDTCM head-to-tail dimers (Fig. 6C) allowed us to address possible intermolecular substrate channeling between domains of different protomers in a protein complex. The active sites of the CM and CDT domains of the two individual chains are separated by a substantial distance of about 40 Å and open toward opposite directions (Fig. 6, C and F). The surface charge distribution (Fig. 6, D-F) does not reveal any pattern for a potential intra- or intermolecular electrostatic relay of intermediates from the CM to the CDT active site for any of the three structures. More specifically, there is no apparent path of positively charged residues that would suggest facilitated crawling of the doubly negatively charged prephenate along the protein surface between the active sites of the two domains. Taken together, the crystal structures do not provide evidence for intra- or intermolecular substrate channeling for the bifunctional fusion enzymes investigated here.

Interactions at the CM and CDT domain interface
Whereas CDT domain superimpositions with *PaCDT revealed no prominent structural differences (Fig. 6, A-C), the two C-terminal helices H5 and H6 of the CM domains in *CMCDT and *CDTCM align with slightly different angles, with *CDTCMs more closely matching *MtCM (Fig. 8A, exemplified with *JbCDTCM). This suggests that C-terminal fusion to CDT may affect the CM structure. In *MtCM, a disulfide bond connects these two C-terminal helices, and it was pointed out previously that this crosslinking is important for the enzyme's stability outside the cell (56).
A full structural assessment of the domain interface in *AfCMCDT is precluded by the poorly resolved linker region (Fig. 8B). In fact, there is ambiguous electron density for residues Pro177-Asp180 (Fig. S10A). This may suggest a more flexible connection of the CM and CDT domains for *AfCMCDT than for the topological counterparts *JbCDTCM and *DsCDTCM, for which the linker regions are clearly resolved (Fig. S10, B and C). Still, residues Asp164 and Lys168 from H6 of the CM domain of *AfCMCDT interface extensively with the CDT domain (Fig. 8B). In *JbCDTCM, CM helices H1 and H5 interact with the conserved "β-hairpin extrusion" of CDT (Fig. 8C). This structural interaction in *CDTCMs appears to be fairly rigid, as the linker segment contains many aromatic and hydrophobic residues that are shielded from the aqueous environment. Furthermore, the hydrogen bonding network from residues of CM helix H5 to the linker backbone and from helices H1 and H5 to the β-hairpin extrusion of the CDT domain, also involving several water molecules, shields the hydrophobic linker (Fig. 8C). It thus appears that the rigid domain interface in *CDTCMs assumes a stabilizing role for the secreted fusion enzyme, thereby taking over the function of the stabilizing disulfide bonds previously observed in secreted CMs (9,19).

No kinetic evidence for substrate gating and channeling from designed active site knock-out and split-domain variants
To shed light on potential functional linkages between the two catalytic sites, we designed active-site knock-out (KO) and split-domain variants of *AfCMCDT and *JbCDTCM. The active-site KO variants were generated by introducing a single residue substitution in either the CM or the CDT domain. In E. coli PheA (EcCM), Lys39 was reported to be essential for CM activity (7,57), which was confirmed by a K39A substitution variant that showed 10⁴- to 10⁵-fold lower kcat/Km (58). Lys39 in EcCM corresponds to Lys60 in *MtCM (56), which was used to allocate the corresponding catalytic residue for substitution with Ala in *AfCMCDT (K48A) and *JbCDTCM (K287A) based on amino acid sequence alignments (Figs. 3 and 4). Glu173 was shown to be essential for CDT activity in *PaCDT, serving as the proton donor to the departing hydroxyl group of prephenate, and a substitution with Gln led to complete inactivation of the dehydratase activity (32). Amino acid sequence alignments (Figs. 3 and 4) were consulted to design the corresponding Glu substitution with Gln in *AfCMCDT (E353Q) and *JbCDTCM (E200Q).
To decide on the split site for engineering the single domain variants, the crystal structures of *AfCMCDT and *JbCDTCM (Figs. 8 and S10) were analyzed in the context of the amino acid sequence alignment of all available natural fusion enzymes (Fig. 9). For *AfCMCDT, the domain-linking segment could not be resolved (Fig. S10A); therefore, the choice of the split site was based solely on the sequence alignment of Figure 9A. Since *AfCMCDT was the only *CMCDT variant with an elongated linker segment, no conserved feature or motif could be determined. Ultimately, the split site was placed between Leu179 and Asp180, in the middle of the linker sequence.
For the *CDTCM variants, the segment linking the two domains showed a highly conserved Trp-Leu-Xaa3-Xaa4-Xaa5-Trp (WLXXXW) motif, where Xaa3 frequently contributes a negative charge (Asp or Glu), Xaa4 is always Phe or Tyr, and Xaa5 is usually Pro (Fig. 9B). Residue Xaa5 also forms the start of the first α-helix in the CM domain. Because the crystal structure of *PaCDT (PDB ID: 5HPQ) (32) could only be resolved to Leu233 (just C-terminal to the first Trp in the linker segment), no structural information could be obtained from *PaCDT to help determine the split site for fully preserving the intact CDT domain. In the *JbCDTCM and *DsCDTCM structures presented here, we observed well-defined electron density for the linkers, with interactions of the two Trp residues at both ends (see the linker region in Fig. 8C in green; Fig. S10, B and C). Ultimately, the split site after the CDT domain was positioned at the end of the linker sequence, between Trp237 and Gly238, based on our new structural insights and the fact that the whole linker segment is conserved in all aligned CDTs. All active-site KO variants as well as the *AfCDT and *JbCM single domains were produced with a C-terminal His-tag, whereas the *AfCM and *JbCDT single domains were cloned with an N-terminal His-tag. Expression of the reengineered genes in E. coli strain KA29 resulted in soluble protein in good yields for all constructs, except for *AfCDT (little soluble protein) and *JbCDT (no soluble protein) (Fig. S4).
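The linker motif described above lends itself to a simple pattern search. The following is a minimal sketch, assuming one-letter amino acid sequences as input; the function names and the example sequences are hypothetical and are not taken from the alignments in Fig. 9.

```python
import re

# W-L-Xaa3-Xaa4-Xaa5-W, with Xaa4 restricted to Phe or Tyr as stated in the text.
# Xaa3 and Xaa5 are left unrestricted because they are only "frequently" acidic
# and "usually" Pro, respectively.
LINKER_MOTIF = re.compile(r"WL[A-Z][FY][A-Z]W")


def find_linker_motifs(sequence):
    """Return (1-based start position, matched motif) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in LINKER_MOTIF.finditer(sequence)]


def is_typical(motif):
    """True if Xaa3 is acidic (Asp/Glu) and Xaa5 is Pro, the most common pattern."""
    return motif[2] in "DE" and motif[4] == "P"


# Hypothetical sequences for illustration only (not real fusion enzymes).
example_hit = "MKTAWLDFPWGSA"
example_miss = "MKTAWLDAPWGSA"  # Xaa4 is Ala, so the motif does not match

print(find_linker_motifs(example_hit))
print(find_linker_motifs(example_miss))
```

A hit only flags a candidate linker; the actual split site between Trp237 and Gly238 was chosen from the structures, not from a pattern match.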
The CM activities of the CDT-active-site KO variants of *JbCDTCM and *AfCMCDT and of the split-off *JbCM domain match well with the corresponding catalytic parameters of the wild-type parent enzymes (Table 3 and Fig. S11, A and B). Whereas the kinetic curves of all variants derived from *JbCDTCM fully overlap, the split-off *AfCM domain showed a slightly higher kcat (2.9 s⁻¹) and a twofold higher Km (5 μM) than wild-type *AfCMCDT (kcat = 2.2 s⁻¹, Km = 2 μM). However, because the limited sensitivity of the employed discontinuous kinetic assay precluded using lower substrate concentrations to better bracket the Km, the observed small differences compared to *AfCMCDT are probably insignificant. A slight structural perturbation caused by removing the CDT domain might also explain the somewhat lower kcat/Km of *AfCM. Knocking out the anticipated active site Lys in *AfCMCDT (K48A) and *JbCDTCM (K287A) fully abolished CM activity, providing experimental evidence for the predicted active site mechanism (Table 3 and Fig. S11, A and B).
The CM-active-site KO variants of *AfCMCDT (K48A) and *JbCDTCM (K287A) exhibit the same kcat values for their CDT activities as their wild-type parents (Table 3 and Fig. S11, C and D). Interestingly, the Km determined for the KO variants of *AfCMCDT (18 μM) and *JbCDTCM (4 μM) was approximately twofold lower than for wild-type *AfCMCDT (40 μM) and *JbCDTCM (7 μM), respectively (Table 3), leading to a twofold higher catalytic efficiency kcat/Km. This may indicate that the native CM domain provides a competing binding site for the substrate prephenate, which would lower the local substrate concentration for the CDT reaction, showing up as a comparatively lower apparent CDT activity of the wild-type formats.
Finally, the split-domain *AfCDT exhibited lower kcat (20 s⁻¹) and Km (28 μM) values, but essentially the same catalytic efficiency as its wild-type counterpart (Table 3). However, as for *AfCM, structural perturbation due to the missing protein domain could be the cause of the observed change in the kinetic parameters of *AfCDT. Notably, no activity was measurable after exchanging the presumed active site Glu353 and Glu200 with Gln in *AfCMCDT and *JbCDTCM, respectively, thus experimentally confirming the crucial importance of this active site residue for CDT activity (Table 3 and Fig. S11, C and D).
Next, we compared the efficiency of the two-step catalytic sequence from chorismate to phenylpyruvate, which is observable upon combining individual active-site KO or split-domain variants of *AfCMCDT with variants possessing the functional missing domain (Fig. S12). The coupled CM+CDT reactions showed that equimolar concentrations of the two active-site KO variants of *AfCMCDT (K48A and E353Q) complemented each other perfectly and reached a catalytic activity (kcat = 2.8 ± 0. [...] 0.7 × 10⁵ M⁻¹ s⁻¹) that coincides, within experimental error, with that of wild-type *AfCMCDT (Table 1). This result directly suggests that the enzyme fusion does not provide a catalytic benefit. An experiment mixing equimolar split *AfCM and *AfCDT domains revealed a lower CM+CDT activity (kcat = 1.3 ± 0.2 s⁻¹, Km = 20.8 ± 7.7 μM, kcat/Km = 6.9 ± 3.6 × 10⁴ M⁻¹ s⁻¹). In this case, the reduced activity is due, in part, to the twofold lower CM activity measured for the individual split *AfCM domain after cleaving off its fusion partner (Table 3), which possibly perturbs the structure of the active site. The inability to produce the soluble split *JbCDT domain precluded a similar set of measurements with the *JbCDTCM system.

*AfCMCDT and *JbCDTCM are not feedback regulated by phenylalanine or tyrosine
Intracellular CMs involved in biosynthetic pathways are known to be subject to feedback regulation (1-3) involving various features such as additional allosteric domains (59), dynamic dimer interfaces (16), or complexes with partner enzymes, which enable sophisticated inter-enzyme allostery (10, 17, 18, 60-64). In contrast, no feedback inhibition has been detected for the exported enzymes *MtCM (19) or *PaCDT (31,32,65). To test whether *AfCMCDT and *JbCDTCM are subject to feedback control, a coupled CM+CDT assay was performed in the presence of a large excess of either Phe or Tyr, the relevant end products of the shikimate pathway. No feedback regulation by Phe or Tyr was observed for either *AfCMCDT or *JbCDTCM (Table 4 and Fig. S13).

Assessing the genomic neighborhood of bifunctional fusion enzyme genes
The absence of feedback regulation is, in addition to the extra-cytoplasmic localization, another indication that the exported bifunctional enzymes are not involved in housekeeping Phe biosynthesis. Moreover, searching the genomes of the nine *CMCDT or *CDTCM producer organisms for likely alternative cytoplasmic versions of these enzymes, we found that all nine organisms possess genes that potentially encode intracellular CMs of the AroQα subclass fused to a prephenate dehydratase domain (Table S2).
To gain a better understanding of the potential biological role of these exported bifunctional enzymes, their genetic context was analyzed by using the webtool RODEO (66). RODEO stands for "Rapid Open reading frame Description and Evaluation Online" and uses the NCBI accession number of the gene of interest to compile its surrounding eight upstream and eight downstream genes in the genomic neighborhood. Gene clusters in the form of operons and closely aligned genes, particularly if they have the same orientation as those encoding the bifunctional fusion enzymes, may shed light on the enzymes' biological roles in the organisms.
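The neighborhood criterion described above (a window of eight genes on either side, with emphasis on genes sharing the orientation of the fusion enzyme gene) can be illustrated with a small filter. This is only a sketch under stated assumptions: it does not call RODEO or any NCBI service, the `Gene` record and the function name are hypothetical, and the gene list is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    strand: str  # "+" or "-"


def co_oriented_neighbors(genes, target_index, window=8):
    """Return genes within `window` positions of the target that share its strand.

    `genes` is assumed to be ordered by genomic position. The window is
    positional, so it covers both flanks regardless of the target's orientation.
    """
    target = genes[target_index]
    start = max(0, target_index - window)
    end = min(len(genes), target_index + window + 1)
    return [
        gene
        for index, gene in enumerate(genes[start:end], start)
        if index != target_index and gene.strand == target.strand
    ]


# Hypothetical neighborhood for illustration only.
neighborhood = [
    Gene("geneA", "+"),
    Gene("geneB", "-"),
    Gene("fusion", "+"),
    Gene("geneC", "+"),
    Gene("geneD", "-"),
]
print([g.name for g in co_oriented_neighbors(neighborhood, target_index=2)])
```

Shared orientation is only a first hint of co-transcription; operon membership would need additional evidence such as intergenic distances.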
The genomic neighborhoods of all nine exported bifunctional fusion enzymes were analyzed using RODEO (Fig. S14). In the case of *AfCMCDT, four genes were located upstream with the same orientation, of which the nearest two code for a putative peroxidase and a hydrolase. The gene for *JbCDTCM is directly flanked by two genes in the same orientation to form an operon, where the gene upstream encodes a putative pyruvate carboxylase (HMGL-like) and the gene downstream a putative lumazine-binding domain. These findings hint at potential roles of the bifunctional fusion enzymes in processes involving substrate degradation and binding. In the neighborhood of the genes for the other exported bifunctional fusion enzymes, an ABC transporter and a TonB receptor were found for *ScCMCDT, suggesting involvement in substrate uptake. An ATPase and another solute-binding protein are encoded directly downstream of the genes for *TaCMCDT and *TvCMCDT, whereas for the *SbCMCDT and *SpCMCDT genes, no flanking coding regions with the same orientation were found. Interestingly, an intramembrane protease and an AI-2E family transporter are encoded by the two genes in the same direction and directly upstream of the one for *DsCDTCM, which suggests a role in signal recognition. Directly upstream of the *MpCDTCM gene, another substrate-binding protein is encoded with the same orientation. In summary, we noticed a considerable proportion of genes for substrate degradation and, particularly, for uptake and transport around the genes for the exported bifunctional fusion enzymes. Together with the discovery of an AI-2E transporter gene in the *DsCDTCM genomic neighborhood, encoding a transport protein for autoinducer 2 (AI-2)-type molecules, which are responsible for signaling and interspecies communication (67,68), our analysis suggests that these exported bifunctional fusion enzymes may play a role in quorum sensing or interbacterial interactions.

Discussion
Nine novel, very rare exported bifunctional fusion proteins consisting of an AroQγ CM domain and a CDT domain were discovered in a few γ- and β-proteobacterial species. Two distinct topologies were observed that divide these fusion enzymes into two classes. In γ-Proteobacteria, the CM domain is fused N-terminally to the CDT domain (*CMCDT), whereas in β-Proteobacteria, CM constitutes the C-terminal domain (*CDTCM).
Seven of the nine exported bifunctional fusion enzymes were further investigated, and all showed CM and CDT activities in E. coli and in kinetic assays in vitro, except for *TaCMCDT, which showed no CDT activity at all in an in vivo complementation assay. Since *TaCMCDT could not be solubly overproduced in E. coli KA29 cells in spite of great efforts (data not shown), no reliable in vitro data could be generated for this enzyme. The low solubility is most likely also the explanation for the poor in vivo complementation (Fig. S3). Production of the single split-off CDT domains yielded very little soluble protein for *AfCDT and none for *JbCDT, suggesting that the CDT domain benefits from the fused CM domain for stability in solution. The exposed interface to the partnering CM domain, possibly around the linker sequence, is probably the culprit for the low solubility after splitting the protein, given that *PaCDT with 40 to 60% sequence identity to the CDT domains was well produced in soluble form (30,31). This is further supported by the much more intricate domain interface found in *JbCDTCM (Fig. 8C) than in *AfCMCDT (Fig. 8B), explaining why it was even more difficult to produce and purify *JbCDT compared to *AfCDT.
To gain insight into the catalytic peculiarities of the fused CM and CDT domains, in vitro kinetic assays were performed. Michaelis-Menten parameters showed that *CDTCM variants exhibited slightly higher catalytic efficiencies than the *CMCDTs for the CM reaction (Table 1), consistent with the somewhat better growth of corresponding transformants during the in vivo complementation (Fig. S3). More intriguing was the revelation that all tested bifunctional fusion enzymes showed a very low Km (<7 μM) for CM activity (Table 1), which is at least 25-fold lower than the Km of *MtCM (180 μM) (19). Even though the catalytic rate constants for the CM domains are 2- to 20-fold lower compared to *MtCM (kcat = 50 s⁻¹), the catalytic efficiencies of the bifunctional fusion enzymes (kcat/Km ≈ 1-3 × 10⁶ M⁻¹ s⁻¹) are still higher than that of *MtCM (kcat/Km ≈ 3 × 10⁵ M⁻¹ s⁻¹). A very low Km enables efficient binding of very low concentrations of chorismate, a property that would be a requirement for detection of chorismate if this molecule served as a signal, e.g., for chemotaxis or inter-cellular communication. In comparison, the kinetic parameters for CDT activity are in the same order of magnitude and show similar catalytic efficiencies as *PaCDT (kcat/Km ≈ 1 × 10⁶ M⁻¹ s⁻¹) (32). This is consistent with the high degree of sequence identity and the conservation of all nine *PaCDT active site residues (32) in the studied CDT domains.
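The comparison of catalytic efficiencies above comes down to a unit conversion from μM to M. The following minimal sketch reproduces it with values quoted in the text (*MtCM: kcat = 50 s⁻¹, Km = 180 μM; wild-type *AfCMCDT CM activity: kcat = 2.2 s⁻¹, Km = 2 μM); the function name is an illustrative choice.

```python
def catalytic_efficiency(kcat_per_s, km_micromolar):
    """Return kcat/Km in M^-1 s^-1 from kcat (s^-1) and Km (micromolar)."""
    return kcat_per_s / (km_micromolar * 1e-6)


# Values quoted in the text for the exported M. tuberculosis CM.
mtcm_efficiency = catalytic_efficiency(50.0, 180.0)

# Values quoted in the text for the CM activity of wild-type *AfCMCDT.
afcmcdt_efficiency = catalytic_efficiency(2.2, 2.0)

print(f"*MtCM    kcat/Km = {mtcm_efficiency:.1e} M^-1 s^-1")
print(f"*AfCMCDT kcat/Km = {afcmcdt_efficiency:.1e} M^-1 s^-1")
print(f"ratio = {afcmcdt_efficiency / mtcm_efficiency:.1f}")
```

Although kcat is more than 20-fold lower for *AfCMCDT, the roughly 90-fold lower Km more than compensates, which is why the efficiency ends up higher than for *MtCM.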
The potential kinetic effects of the CM and CDT domain fusions were further addressed with coupled CM+CDT kinetic assays. While no catalytic benefit from the fusion was observed, it was noted that the CM domain's kcat was rate-limiting for the *CMCDT variants. For the *CDTCM enzymes, the kcat for the combined CM+CDT reaction was slower than for either of the single-step CM or CDT catalytic reactions, suggesting that another process was rate-limiting. This could be a slow transition involving the steps from prephenate release by the CM domain to rebinding in the active site of the CDT domain, possibly due to topological or dynamic constraints in the fusion enzymes, which were shown to have diverging ligand exit/entry trajectories (Fig. 6).
Additional insights were obtained from examining active-site KO and split-single-domain variants, using *AfCMCDT and *JbCDTCM as representatives for the *CMCDT and *CDTCM topologies, respectively. While CM activity was found to be unaffected compared to the wild-type fusion enzyme, a twofold lower Km for CDT activity was observed for the CM-active-site KO variants of *JbCDTCM and *AfCMCDT compared to their wild-type counterparts. This hints at competition between the CM and the CDT active sites for prephenate binding at low substrate concentration. The fact that a mixture of the two *AfCMCDT active-site KO variants exhibited the same activity for the two-step process from chorismate to phenylpyruvate as the parental *AfCMCDT is a strong indication that having the domains covalently tethered brings no functional advantage. Whereas *MtCM and *PaCDT are known to form oligomers (9,19,32), we show here by SEC that the bifunctional fusion enzymes are predominantly monomeric. This observation is supported by the protein crystal structures of *AfCMCDT and *JbCDTCM. Moreover, the crystal structures of all three bifunctional fusion enzymes presented here give no indication of substrate channeling. Taken together, there does not appear to be any catalytic benefit of the fused domains in these bifunctional enzymes.
Domain fusions can have other benefits. For instance, they can facilitate coordinated gene expression or protein degradation (69,70). Regulatory mechanisms to ensure stoichiometric concentrations of the active sites of the two sequential CM+CDT reactions must be put in place only once for the fusion protein, rather than for each enzyme separately. Also, both functions co-localize to the periplasm, and the transfer across the membrane can occur simultaneously for CM and CDT. Even though not apparent in our kinetic assay, having co-localized active sites may be an advantage to sequester intermediates (here prephenate) from unwanted conversion or side reactions by competing enzymes. Our discovery that the fusion between a CM and a CDT occurred twice in evolution, in two different topologies and in two independent taxonomic classes of Proteobacteria, is a strong indication that it must be beneficial for the producer organism.
It is still an enigma why these coupled enzymes are exported from the cytoplasm into a compartment where there is neither an obvious metabolic source for the substrate chorismate nor use of the products prephenate and phenylpyruvate. In fact, a role of these bifunctional fusion enzymes in the biosynthesis of aromatic amino acids is extremely unlikely; for one, they are exported to the periplasmic space and thus distant from the biosynthetic shikimate pathway (1,2). Moreover, they are not subjected to feedback regulation by the metabolic end products Phe or Tyr, which is otherwise ubiquitous in intracellular shikimate pathway enzymes (1-3, 10, 16-18, 59-62). Furthermore, there are cytoplasmic alternatives present in each organism for Phe biosynthesis (Table S2). For exported CMs of the AroQγ subclass, an alternative explanation to the involvement in central metabolism is a contribution to pathogenesis (19-29). However, since a pathogenic lifestyle of the bacterial producers of the exported fusion enzymes is not likely, these enzymes must play a different role.
The particularly low Km values for the CM domains, which allow efficient binding of very low concentrations of the substrate chorismate, may hint at their true biological task. The genomic localization together with genes for solute-binding and transport proteins, as well as a presumed AI-2 transporter, suggests an involvement of the bifunctional fusion enzymes in processes such as chemotaxis, quorum sensing or interspecies communication (67,71,72). This is supported by the hypothesis that secreted CDTs had evolved from an ancient periplasmic binding protein (32), which is a potential nutrient, repellent or other metabolite sensor protein that can interact with the bacterial chemotaxis machinery. The evolution of catalytic CDT function in a periplasmic binding protein specific for phenylpyruvate would allow the expansion of sensing to prephenate. The addition of a secreted CM then extends the signaling cascade further to integrate the detection of free chorismate present in the environment. Sensing chorismate, a normally strictly intracellular metabolite, could signal to the owner of the exported fusion enzymes the presence of lysed bacteria. Chorismate, which by itself is an inherently unstable metabolite (73), should be a formidable, self-resetting primary messenger. Its instantaneous detection, followed by triggering an appropriate behavioral response in the sensing organism, may well constitute an important advantage within complex bacterial habitats. We postulate that the bifunctional fusion enzymes characterized here may be part of such systems.

Materials and general procedures
Chorismate was produced by a published procedure (74). Prephenate was prepared from 10 mg of commercial prephenic acid barium salt (>75%) (#P2384-10MG, Sigma-Aldrich/Merck, Darmstadt, Germany), which was dissolved in 400 μl 50 mM HEPES, pH 7.0, containing 120 mM K2SO4. The precipitating BaSO4 was removed by centrifugation at 15,000g for 5 min at 4 °C, and the supernatant was transferred to a microtube and snap-frozen for storage at −80 °C. All other chemicals were purchased from Sigma-Aldrich/Fluka (St Louis, MO, USA). DNA manipulations were performed by standard procedures (75) or Golden Gate Assembly (GGA) cloning (details in the Supporting Experimental Procedures). All DNA polymerases, DNA ladders, restriction endonucleases, and ligases were purchased from New England Biolabs (Ipswich, MA, USA). Sanger DNA sequencing and oligonucleotide synthesis were performed by Microsynth (Balgach, SG, Switzerland).

Search for exported bifunctional fusion enzymes
The exported bifunctional fusion enzymes consisting of an AroQγ subclass CM domain and a cyclohexadienyl dehydratase domain were discovered by BLASTP (76) using the fully translated open reading frame Rv1885c (*MtCM; UniProt: P9WIB9) (19) including the signal peptide. The bifunctional fusion enzymes were identified by 50% query coverage but with high sequence similarity and identity scores. Only search hits with full taxonomic classification to the species level were considered. The initially identified fusion enzymes were thereafter used for further BLASTP searches to find all nine final bifunctional fusion enzymes (for protein accession numbers, see Fig. S14).
The amino acid sequences of the bifunctional fusion enzymes were reverse translated and codon-optimized for expression in E. coli using CLC Genomic Workbench v9.01 (QIAGEN CLC bio, Aarhus, Denmark). Furthermore, undesired restriction sites were removed manually by introducing silent mutations, and flanking restriction sites were added for cloning. A list of all nucleotide sequences can be found in the Supporting Information (Fig. S2). Gene synthesis was performed by GenScript (Piscataway, NJ, USA) or TWIST Bioscience (South San Francisco, CA, USA).

Bacterial strains and plasmids
E. coli strains KA12 (47,48) and KA29 (19) and plasmids pKTCTET-0 (46) and pKIMP-UAUC (48) were previously described. The construction of pKTCTET derivatives encoding the bifunctional fusion enzymes with and without their natural signal sequences as well as the single-domain and active-site knockout variants of *AfCMCDT and *JbCDTCM is detailed in the Supporting Experimental Procedures.

Protein overproduction and purification by Ni²⁺-NTA affinity chromatography
Luria Bertani (LB) starter cultures containing 100 μg/ml sodium ampicillin (Amp100) and 50 μg/ml kanamycin (Kan50) were prepared from KA29/pKTCTET derivatives encoding the appropriate cytoplasmic bifunctional fusion enzyme variant and grown at 37 °C, 230 rpm, overnight. The starter cultures were spun down and the pellet was resuspended in fresh medium to remove β-lactamases released into the supernatant before 800 ml LB production cultures containing 180 μg/ml sodium ampicillin (Amp180) and Kan50 were inoculated at a ratio of 1:100. The production cultures were incubated at 20 °C, 160 rpm, for 2.5 days, then gently pelleted at 2000g for 10 min and resuspended in fresh 800 ml LB containing Amp180 and Kan50. The cultures were incubated for another 24 h at 20 °C, 160 rpm. To harvest the cells, the cultures were cooled on ice for 15 min and the cells were pelleted by centrifugation at 4000g for 15 min at 4 °C. The supernatant was discarded.
The pellets were resuspended in Sonication Buffer (20 mM Tris-HCl, pH 8.0, containing 150 mM NaCl). Lysozyme was added to a final concentration of 1 mg/ml and the suspension was incubated on ice for 30 min. The cells were lysed by sonication (3 rounds each with 100% amplitude, full cycle; 2 min sonication and 2 min breaks in an iced water bath, using a UP 200 s tip, Sonotrode S7; Dr Hielscher GmbH, Teltow, Germany) and the lysates were centrifuged at 20,000g for 20 min at 4 °C. The cleared supernatant was transferred to a beaker and Elution Buffer 2 (50 mM Tris-HCl, pH 8.0, containing 150 mM NaCl and 500 mM imidazole) was added to reach a final concentration of 20 mM imidazole. Six milliliters of Ni²⁺-NTA slurry (Qiagen, Venlo, LI, Netherlands) were equilibrated in a gravity flow column with Wash Buffer (50 mM Tris-HCl, pH 8.0, containing 150 mM NaCl and 20 mM imidazole). The supernatant was loaded onto the metal affinity column and allowed to flow through. The Ni²⁺-NTA bed was extensively washed with Wash Buffer before elution of metal-bound protein with 16 ml of Elution Buffer 1 (50 mM Tris-HCl, pH 8.0, containing 150 mM NaCl and 250 mM imidazole). The elution fractions were dialyzed using SnakeSkin tubing (Thermo Fisher Scientific, Waltham, MA, USA) with a molecular weight cutoff below the calculated mass of the protein. Dialysis was performed in 5 L Dialysis Buffer (20 mM Tris-HCl, pH 8.0, containing 150 mM NaCl) overnight at 4 °C, followed by a second dialysis step in fresh 5 L Dialysis Buffer for 4 to 6 h at room temperature. The dialyzed protein was sterile filtered (0.22 μm) and either stored at 4 °C for direct use or snap-frozen in liquid nitrogen for storage at −80 °C.
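The adjustment of the cleared lysate to 20 mM imidazole with Elution Buffer 2 (500 mM imidazole) is a simple mixing calculation. The sketch below assumes that the lysate in Sonication Buffer contains no imidazole; the lysate volume of 24 ml is a hypothetical example, since the text does not state it.

```python
def stock_volume_to_add(sample_ml, stock_mM, target_mM, sample_mM=0.0):
    """Volume of stock (ml) to add so that the mixture reaches target_mM.

    Mass balance: sample_ml * sample_mM + v * stock_mM
                  = (sample_ml + v) * target_mM
    """
    if not sample_mM <= target_mM < stock_mM:
        raise ValueError("target must lie between sample and stock concentrations")
    return sample_ml * (target_mM - sample_mM) / (stock_mM - target_mM)


# Hypothetical lysate volume; 500 mM stock and 20 mM target are from the text.
lysate_ml = 24.0
added_ml = stock_volume_to_add(lysate_ml, stock_mM=500.0, target_mM=20.0)
final_mM = added_ml * 500.0 / (lysate_ml + added_ml)
print(f"add {added_ml:.2f} ml of Elution Buffer 2 -> {final_mM:.1f} mM imidazole")
```

In this case the rule of thumb is one volume of Elution Buffer 2 per 24 volumes of lysate.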
Enzyme concentrations [E] were determined using the Bradford assay (77). The expected molecular masses were confirmed by electrospray ionization mass spectrometry (ESI-MS) at MoBiAS (Laboratory of Organic Chemistry, ETH Zurich) with mass data reported in Table S3.

Size-exclusion chromatography
Analytical size-exclusion chromatography (SEC) was performed using the BioLogic DuoFlow system with a QuadTec UV/Vis Detector (BioRad Laboratories Inc, Hercules, CA, USA) and the Superdex 200 Increase 10/300 GL column (Cytiva, Little Chalfont, UK). The running buffer (20 mM Tris-HCl, pH 8.0, containing 150 mM NaCl), degassed and prechilled to 4 °C, was also used to equilibrate the whole system. Samples of 0.8 to 1.3 mg were injected in a volume of 0.6 ml at a flow rate of 0.5 ml/min and passed through the column with 25 ml running buffer at a flow rate of 0.7 ml/min. The monitored absorbance at 214 nm (A214nm) was used to plot elution profiles of the protein standard mix Supelco Protein Standard Mix 15 to 600 kDa (Sigma-Aldrich/Merck) and the analyzed protein samples.

CM, CDT, and coupled CM-CDT in vitro kinetic assays
For the final read-out, the absorbance of phenylpyruvate after conversion to its enolate form was recorded for each of the CM, the CDT, and the coupled CM-CDT discontinuous assays, as detailed in Fig. S15 and Table S4. The enolate form of phenylpyruvate exhibits a high extinction coefficient (ε) of 17,500 M⁻¹ cm⁻¹ at 320 nm, which allows the detection of very low phenylpyruvate concentrations. Six different substrate concentrations ranging from 2.5 to 100 μM were measured, each at four time points between 0 and 4 min. A more detailed experimental protocol can be found in the Supporting Experimental Procedures. From the four time points, the vinit value for each substrate concentration was calculated, with additional correction for the spontaneous background turnover rate

*DsCDTCM crystals grew at 4 °C from a 1:1 (300 nl:300 nl) mixture of 5.6 mg/ml protein in 20 mM Tris-HCl, pH 8.0, and condition G12 from the JCSG+ crystallization screening kit (Molecular Dimensions Ltd), containing 3 M NaCl and 0.1 M Bis Tris, pH 5.5, in a sitting drop 96-well SWISSCI UVXPO 2-lens crystallization plate. The crystal was cryoprotected by quick soaking in freshly prepared reservoir solution complemented with ethylene glycol to a final concentration of 20% v/v, flash-cooled in liquid nitrogen, and stored for data collection.
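As a rough illustration of the kinetic read-out described above (absorbance of the phenylpyruvate enolate at 320 nm, four time points per substrate concentration, background correction), the following sketch converts absorbance to concentration and estimates an initial rate by linear regression. It assumes a 1 cm path length, no additional dilution factor, and evenly spaced hypothetical time points; the actual protocol is given in the Supporting Experimental Procedures, and all function names are illustrative.

```python
EPSILON_ENOLATE = 17_500.0  # M^-1 cm^-1 at 320 nm, as stated in the text


def phenylpyruvate_uM(a320, path_cm=1.0, dilution_factor=1.0):
    """Beer-Lambert conversion of A320 to micromolar phenylpyruvate.

    path_cm and dilution_factor are assumptions of this sketch.
    """
    return a320 / (EPSILON_ENOLATE * path_cm) * 1e6 * dilution_factor


def initial_rate(times_s, conc_uM):
    """Least-squares slope (micromolar per second) through the time points."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_c = sum(conc_uM) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times_s, conc_uM))
    return sxy / sxx


def background_corrected(enzyme_rate, background_rate):
    """Subtract the spontaneous (non-enzymatic) turnover rate."""
    return enzyme_rate - background_rate


# Hypothetical absorbance readings at four time points between 0 and 4 min.
times = [0.0, 80.0, 160.0, 240.0]
with_enzyme = [phenylpyruvate_uM(a) for a in (0.000, 0.035, 0.070, 0.105)]
no_enzyme = [phenylpyruvate_uM(a) for a in (0.000, 0.0035, 0.0070, 0.0105)]

v_init = background_corrected(initial_rate(times, with_enzyme),
                              initial_rate(times, no_enzyme))
print(f"v_init = {v_init:.4f} uM/s")
```

With ε = 17,500 M⁻¹ cm⁻¹, an absorbance of 0.0175 corresponds to only 1 μM phenylpyruvate at 1 cm path length, which is what makes measurements at a few micromolar substrate feasible.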

Data collection, structure determination, and refinement
Data collection for *AfCMCDT was performed at 100 K at the European Synchrotron Radiation Facility (ESRF, Grenoble, France) beamline ID30A-1, which was equipped with a PILATUS3 2M detector. Diffraction data for *JbCDTCM and *DsCDTCM were collected at 100 K at the BioMAX beamline, MAX-IV (Lund, Sweden), equipped with an EIGER X 16M detector. Data sets covering 360° were collected for each crystal, with oscillation ranges of 0.2° for *AfCMCDT, and 0.1° for *JbCDTCM and *DsCDTCM. The resolution at the detector edge was set to 1.0 Å for *AfCMCDT, to 1.5 Å for *JbCDTCM, and to 1.8 Å for *DsCDTCM. All three data sets were indexed and integrated using automated pipelines at the synchrotrons (autoPROC (78)). All data sets showed some anisotropy and were scaled and merged with the STARANISO server (78).
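The data collection strategy above implies the number of diffraction images per data set. A minimal sketch, assuming the total rotation and oscillation values are given in degrees and that one image is recorded per oscillation step (the function name is illustrative):

```python
def frame_count(total_rotation_deg, oscillation_deg):
    """Number of images for a rotation data set, one image per oscillation step."""
    # round() guards against floating-point artifacts in the division
    return round(total_rotation_deg / oscillation_deg)


frames_af = frame_count(360.0, 0.2)  # *AfCMCDT
frames_jb = frame_count(360.0, 0.1)  # *JbCDTCM and *DsCDTCM
print(frames_af, frames_jb)
```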
The structures were refined by alternating cycles of maximum-likelihood refinement using REFMAC (81), a program from the CCP4 software suite (80), with model inspection and rebuilding using Coot (82). Water molecules, alternative conformations, and ligands were added to the structures in later refinement cycles, interpreting peaks in the σA-weighted mFo-DFc difference electron density map, taking the content of the mother liquor into account. Sodium ions were added when the difference density peaks were in close proximity (closer than a typical H-bond) to one or several carbonyl groups, or if an octahedral coordination was observed (one instance). Chloride ions were modeled when the placement of a water molecule led to artificially low B-factors and residual difference electron density at the exact position (and if the chemical environment was compatible with a negative charge). Some of the chloride ions were found in the CDT active site, at a position likely binding the carboxylate groups of the substrate (prephenate) or product (phenylpyruvate; Fig. 7C). In the CDT active site of *JbCDTCM, positive difference density revealed the presence of a MES molecule from the crystallization buffer; the sulfonate group occupied the position of chloride ions in *DsCDTCM. Acetate, also a component of the crystallization solution, was modeled into both CM and CDT active sites of *AfCMCDT. Finally, occupancy refinement was carried out with phenix.refine (83), a tool from the Phenix software package (84). Data collection, processing and refinement statistics (as calculated with Phenix (84)) are summarized in Tables 2 and S1.

Data availability
The atomic coordinates and structure factors of the crystal structures of *AfCMCDT (PDB ID: 8CQ3), *JbCDTCM (PDB ID: 8CQ4), and *DsCDTCM (PDB ID: 8CQ6) have been deposited in the Protein Data Bank.

Figure 2. Phylogenetic tree representation of the exported *CMCDT and *CDTCM fusion proteins. Shown is the phylogenetic relationship between the proteobacterial species that possess an exported *CMCDT or *CDTCM bifunctional fusion enzyme. Each node represents a taxonomic sub-classification.

Figure 3. Sequence alignment of bifunctional *CMCDT variants. The amino acid sequences of the *CMCDT bifunctional fusion enzymes are aligned to *MtCM (AroQγ subclass) and *PaCDT (also known as PheC). Signal sequences are underlined, and established *MtCM and *PaCDT active site residues (and their fully conserved counterparts in the new sequences) are highlighted in red and blue, respectively. Cys residues forming a disulfide bond in *MtCM and *PaCDT are boxed. Residue numbering shown on the right starts with the first residue (Met1) of the open reading frame including the signal sequence, except for *PaCDT, where the initial residue corresponds to the N-terminus of the mature protein after signal sequence cleavage (Gln1). This exception, distinguished with blue numbers, allows for compatibility with the nomenclature adopted by Clifton et al. in their work on *PaCDT (32) and thereby for consistency in the discussion of active site residues; it is also applied in the structural figures. The color gradients from white to dark red below the alignment indicate the level of sequence conservation at any given position.

Figure 4. Sequence alignment of bifunctional *CDTCM variants. Shown is the amino acid sequence alignment of the bifunctional *CDTCM fusion enzymes to *MtCM and *PaCDT. More details are described in the legend to Fig. 3.

a Kinetic measurements were conducted at 30 °C in PBS buffer at pH 7.5. Two independently prepared biological replicates were assayed and their individual kcat, Km, and kcat/Km determined after fitting to the Michaelis-Menten equation. Averages and standard deviations were calculated from the two individually determined values. Parameters derived from the Michaelis-Menten kinetics, for which the lowest substrate concentration technically measurable (2.5 μM) was less than 2.5-fold below the experimental Km, are shown in italics to indicate that they are less reliable. Plots illustrating curves fitted to pre-averaged data points at each substrate concentration are shown in Fig. S5.
b Published kinetic parameters of the exported CM from M. tuberculosis (*MtCM) (19) and of the exported CDT from P. aeruginosa (*PaCDT) (32).
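The workflow in footnote a (fit each replicate to the Michaelis-Menten equation, then average the individually determined parameters) can be sketched as follows. Note the swapped-in technique: instead of the fitting procedure used by the authors, which is not specified here, this sketch uses the Hanes-Woolf linearization ([S]/v plotted against [S]) so that it runs with the standard library only. The substrate concentrations and rates are synthetic, and the replicate kcat values are hypothetical.

```python
from statistics import mean, stdev


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def hanes_woolf(substrate_uM, rates_uM_per_s):
    """Estimate (Vmax, Km) from [S]/v = [S]/Vmax + Km/Vmax."""
    ratios = [s / v for s, v in zip(substrate_uM, rates_uM_per_s)]
    slope, intercept = linear_fit(substrate_uM, ratios)
    vmax = 1.0 / slope
    return vmax, intercept * vmax


def mean_and_sd(values):
    """Average and sample standard deviation of replicate parameters."""
    return mean(values), stdev(values)


# Synthetic, noise-free data generated from Vmax = 0.02 uM/s and Km = 5 uM.
# Six concentrations in the 2.5-100 uM range are assumed for illustration.
substrate = [2.5, 5.0, 10.0, 25.0, 50.0, 100.0]
rates = [0.02 * s / (5.0 + s) for s in substrate]

vmax_fit, km_fit = hanes_woolf(substrate, rates)
kcat_fit = vmax_fit / 0.01  # hypothetical enzyme concentration of 0.01 uM

# Hypothetical kcat values from two biological replicates.
kcat_mean, kcat_sd = mean_and_sd([2.0, 2.4])
print(f"Vmax = {vmax_fit:.3f} uM/s, Km = {km_fit:.1f} uM, kcat = {kcat_fit:.1f} s^-1")
print(f"kcat = {kcat_mean:.1f} +/- {kcat_sd:.2f} s^-1 (n = 2)")
```

Linearizations weight experimental errors differently from a direct nonlinear fit, so with real, noisy data the two approaches can give somewhat different parameters.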

Figure 6. Superimpositions of *AfCMCDT, *JbCDTCM, and *DsCDTCM with *MtCM and *PaCDT. A-C, *MtCM (PDB ID: 2FP2, light gray) (9) and *PaCDT (PDB ID: 5HPQ, light green) (32) are shown superimposed onto *AfCMCDT (PDB ID: 8CQ3; this work) (A), *JbCDTCM (PDB ID: 8CQ4; this work) (B), and *DsCDTCM dimer (PDB ID: 8CQ6; this work) (C), with CM and CDT domains from two different chains shown in a large dashed rectangle. CM domains of the fusion proteins are depicted in red, magenta, and pink, respectively, and CDT domains in blue, cyan, and dark blue. Hinges between CDT subdomains are colored orange, and the expected exit trajectories from the active sites are indicated with black arrows. D-F, Surface representations of *AfCMCDT (PDB ID: 8CQ3; this work) (D), *JbCDTCM (PDB ID: 8CQ4; this work) (E), and *DsCDTCM (PDB ID: 8CQ6; this work) (F) with residues charged positively (Arg, Lys, His) and negatively (Glu, Asp) in blue and red, respectively. Note that the CM and CDT domains from the two different chains shown in the dashed rectangular box in (C) appear in front in panel (F) after rotation by 90° for better surface visualization. The active sites, which are all 35 to 40 Å apart, are indicated with small squares; those facing backward are drawn with dashed lines. The CM active sites are illustrated with the endo-oxabicyclic transition state analog (TSA) bound in the *MtCM structure (PDB ID: 2FP2) (9), and the CDT active sites with a 2-(N-morpholino)ethanesulfonic acid (MES) molecule that was co-crystallized with *JbCDTCM (yellow spheres; for details, see Fig. S6C).

Figure 7. Superimposition of active site residues in CM and CDT. A, Superimposition of CM active sites of *AfCMCDT (PDB ID: 8CQ3; this work) (red) and *JbCDTCM (PDB ID: 8CQ4; this work) (magenta) with the previously published monofunctional AroQγ CM holotype *MtCM (PDB ID: 2FP2, gray) (9) in complex with an endo-oxabicyclic transition state analog (TSA, yellow; no ligands are shown for CM active sites of *AfCMCDT and *JbCDTCM). B, Superimposition of CDT active sites of *AfCMCDT (PDB ID: 8CQ3; this work) (blue) and *JbCDTCM (PDB ID: 8CQ4; this work) (cyan) with the monofunctional *PaCDT (PDB ID: 5HPQ, green) (32). *JbCDTCM contains an MES molecule (yellow) from the buffer in the CDT active site. C, Chemical structures of the ligands TSA, phenylpyruvate (the CDT substrate), and MES. Hydrogen-bonding interactions between ligand and protein are depicted as yellow dashed lines. Water molecules are displayed as red spheres.
(i) two-domain protein variants with individual active sites knocked out (KO) and (ii) truncated variants in which the individual CM or CDT domains are split up. Increased CDT activity in a CM-active-site KO variant (CMCDT or CDTCM) in comparison with the parental enzyme would suggest that the presence of an active CM domain interferes with CDT activity. Conversely, a CDT-active-site KO variant (CMCDT or CDTCM) may reveal potential effects of the CDT domain on CM domain function. Split variants consisting only of a single CM or CDT domain, thus uncoupling the covalently linked enzymatic activities, may provide complementary information on the biological purpose of the domain fusions. *AfCMCDT and *JbCDTCM served as representatives of the two bifunctional fusion enzyme topologies for the generation of the active-site KO and split-domain variants.

Figure 8. Protein-protein interactions between CM and CDT fusion domains. A, left panel: Superimposition of *MtCM (PDB ID: 2FP2) (9) (light gray, with helices H5 and H6 in dark gray) with the CM domains of *AfCMCDT and *JbCDTCM (respective PDB IDs: 8CQ3 and 8CQ4; this work; both in light gray, with the correspondingly labeled C-terminal helices of their CM domains in red and magenta, respectively). The TSA ligand (yellow spheres; from PDB ID: 2FP2) indicates the position of CM active site pockets with respect to the location of helices H5 and H6. Right panel: CM domains (gray) of *AfCMCDT and *JbCDTCM, superimposed to illustrate the topological difference of their corresponding CDT domains (blue and cyan, respectively), with the CM domains rotated 90° relative to the structures on the left. The circular lenses in (B) and (C) allow close-up views of the molecular interaction network between the CM (with relevant CM helices labeled) and CDT domains of (B) *AfCMCDT and (C) *JbCDTCM (respective PDB IDs: 8CQ3 and 8CQ4; this work). The linker sequences (green) connect the CM and CDT domains; residues engaged in prominent polar contacts between the two domains are shown as sticks. Water molecules involved in the interaction network are depicted as red spheres and hydrogen-bond contacts as black dashed lines. For the electron density of the linker regions, see Fig. S10.

Figure 9. Amino acid sequence alignments corresponding to the domain linker regions. Amino acid residues belonging to the signal peptides are underlined. Residues that are presumed to be part of the linkers are identified with the green box. The numbers on the right refer to the rightmost residue. For residue numbering of *PaCDT (in blue), see the legend to Fig. 3. Alignment of (A) *CMCDT and (B) *CDTCM variants.

… 3 | 12.4 ± 1.4 | 2.3 ± 0.0 × 10^5 | 7.7 ± 0.4 | 6.2 ± 1.7 | 1.3 ± 0.4 × 10^6
+ Phe | 2.9 ± 0.1 | 14.4 ± 1.3 | 2.0 ± 0.1 × 10^5 | 7.8 ± 1.2 | 9.8 ± 0.4 | 7.9 ± 0.9 × 10^5
+ Tyr | 3.0 ± 0.3 | 16.8 ± 7.5 | 1.9 ± 0.7 × 10^5 | 8.0 ± 1.5 | 8.4 ± 0.3 | 9.5 ± 2.1 × 10

Michaelis-Menten kinetic parameters were determined in PBS buffer, pH 7.5, at 30 °C in the absence (control) or presence of 200 μM Phe or Tyr (80,000-fold molar excess relative to enzymes). All data were calculated from Michaelis-Menten parameters derived from two independently prepared biological replicates. Individual kcat, Km, and kcat/Km parameters were determined separately for each replicate by fitting the data to the Michaelis-Menten equation, and the mean and standard deviation were calculated from these values. Pre-averaged data points of the two replicates at each substrate concentration were used for curve fitting in the plots shown in Fig. S13.
b The kinetic parameters for *AfCMCDT and *JbCDTCM controls without Phe and Tyr addition are taken from Table

at 30 °C for chorismate or prephenate. The Michaelis-Menten graphs were generated using Prism (GraphPad Software, San Diego, CA, USA) by plotting v_init/[E] against the substrate concentration [S]. Data points were fitted to the Michaelis-Menten equation v_init = kcat × [E] × [S]/(Km + [S]) to determine the catalytic rate constant kcat and the Michaelis constant Km. All reported catalytic parameters are averages with standard deviations (σ_(n-1)) calculated from individually determined values for kcat, Km, and kcat/Km of Michaelis-Menten kinetics performed separately with two independently prepared biological replicates. Feedback regulation by either Phe or Tyr was analyzed by performing the same protocol as for the discontinuous coupled CM+CDT assay but with an additional 200 μM Phe or 200 μM Tyr in the reaction buffer. Inhibition of *JbCDTCM by MES was tested by carrying out the CDT assay at 20 μM, 200 μM, 2 mM, and 20 mM concentrations of 2-(N-morpholino)ethanesulfonic acid at a fixed prephenate concentration of 50 μM.

Protein crystallography

Crystallization

*AfCMCDT crystallized at 20 °C in a sitting-drop 96-well SWISSCI UVXPO 2-lens crystallization plate. Diffraction-quality crystals grew from a 1:3 volume ratio (150 nl:450 nl) mixture of 5 mg/ml protein, stored in 20 mM Tris-HCl, pH 8.0, 150 mM NaCl, and condition G7 from the PACT premier crystallization screening formulation (Molecular Dimensions Ltd, Rotherham, UK), containing 0.1 M Bis Tris propane, pH 7.5, 0.2 mM sodium acetate, and 20% w/v PEG3350. The crystal was cryoprotected by briefly dipping it in crystallization solution supplemented with glycerol to a final concentration of 20% v/v, and was thereafter flash-cooled in liquid nitrogen and stored until data collection.

*JbCDTCM at 4 mg/ml (85 μM) in 20 mM Tris-HCl, pH 8.0, was used for crystallization experiments. *JbCDTCM crystals were obtained at 20 °C from a 1:1 (200 nl:200 nl) mixture of protein sample and solution C1 from the Morpheus crystallization screening kit (Molecular Dimensions Ltd), containing 30% w/v PEG500MME_P20K (10% w/v PEG 20000, 20% v/v PEG 500 monomethyl ether), 0.09 M NPS (0.03 M sodium nitrate, 0.03 M disodium hydrogen phosphate, 0.03 M ammonium sulfate), and 0.1 M MES/imidazole, pH 6.5. The crystallization experiment was set up in a sitting-drop 96-well UVXPO 2-lens crystallization plate (SWISSCI, High Wycombe, UK). Crystals were flash-cooled in liquid nitrogen without any additional cryoprotectant and stored for data collection.
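The fitting step described in the kinetics methods, v_init/[E] = kcat × [S]/(Km + [S]), can be illustrated with a short, dependency-free Python sketch. This is not the procedure implemented in Prism, which the authors used; it is an assumed stand-in that exploits the fact that, for a fixed Km, the least-squares kcat has a closed form, so only Km needs a one-dimensional search. All function names and the synthetic data set are hypothetical.

```python
import math


def mm_rate(s, kcat, km):
    """Michaelis-Menten turnover v_init/[E] at substrate concentration s."""
    return kcat * s / (km + s)


def best_kcat(s_values, rates, km):
    """Closed-form least-squares kcat for a fixed Km (the model is linear in kcat)."""
    x = [s / (km + s) for s in s_values]
    return sum(r * xi for r, xi in zip(rates, x)) / sum(xi * xi for xi in x)


def sse(s_values, rates, km):
    """Sum of squared residuals at the best kcat for this Km."""
    kcat = best_kcat(s_values, rates, km)
    return sum((r - mm_rate(s, kcat, km)) ** 2 for s, r in zip(s_values, rates))


def fit_michaelis_menten(s_values, rates, km_min=0.01, km_max=1e4, n_grid=200):
    """Return (kcat, Km) minimizing the squared error; Km in the units of s_values."""
    # Coarse search on a logarithmic Km grid to bracket the minimum.
    grid = [km_min * (km_max / km_min) ** (i / (n_grid - 1)) for i in range(n_grid)]
    errors = [sse(s_values, rates, km) for km in grid]
    i = min(range(n_grid), key=errors.__getitem__)
    i = max(1, min(n_grid - 2, i))
    lo, hi = grid[i - 1], grid[i + 1]
    # Golden-section refinement inside the bracket.
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(100):
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if sse(s_values, rates, a) < sse(s_values, rates, b):
            hi = b
        else:
            lo = a
    km = (lo + hi) / 2
    return best_kcat(s_values, rates, km), km


# Synthetic, noise-free example (hypothetical values, not measured data).
substrates_um = [2.5, 5, 10, 20, 50, 100, 200]
true_kcat, true_km = 3.0, 12.0
rates = [mm_rate(s, true_kcat, true_km) for s in substrates_um]
kcat_fit, km_fit = fit_michaelis_menten(substrates_um, rates)
```

Following the protocol in the text, each biological replicate would be fitted separately in this way, and the means and sample standard deviations of kcat, Km, and kcat/Km would then be computed from the per-replicate results.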

Table 1
In vitro kinetic parameters of the bifunctional fusion enzymes a
*DsCDTCM were crystallized, and their structures solved and refined to 1.6, 1.7, and 2.4 Å resolution, respectively, and Rfree/Rwork values of 24.8/20.5%, 24.3/19.9%, and 25.8/23.0% (Table 2). We thus have high-quality examples for both *CMCDT and *CDTCM topologies, represented by *AfCMCDT and *JbCDTCM, respectively, and a lower-quality structure of *DsCDTCM, with relatively high Wilson B-factors indicating disorder (Table

Table 2
Data collection and refinement statistics

Table 3
In vitro kinetic activities of active-site KO and split-domain variants of *AfCMCDT and *JbCDTCM a
All kinetic measurements were performed in PBS buffer, pH 7.5, at 30 °C. The reported kcat, Km, and kcat/Km parameters represent the mean and standard deviation of the parameters calculated for two independently prepared biological replicates, for which the kinetic data were individually fitted to the Michaelis-Menten equation. The corresponding plots are shown in Fig. S11; there, however, the curve was fitted through the mean activity at each substrate concentration. Parameters derived from Michaelis-Menten kinetics for which the lowest technically measurable substrate concentration (2.5 μM) was less than 2.5-fold below the experimental Km are shown in italics to indicate that they are less reliable.
b The kinetic parameters of wild-type *AfCMCDT and *JbCDTCM are taken from Table 1 for comparison.
c The engineered active-site KO variants showed no measurable enzymatic activity for their mutated domain function.
d The split-domain variants were only measured for their domain's inherent activity.

Table 4
Test for feedback regulation of *AfCMCDT and *JbCDTCM activity by Phe and Tyr a
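Two quantities implied by the feedback-regulation test can be worked out with a small Python sketch: the enzyme concentration corresponding to an 80,000-fold molar excess of 200 μM effector, and the catalytic efficiency in the presence of Phe relative to the control. The kcat/Km values are copied from the table rows as they appear in the text; which enzyme each set of columns belongs to is not stated there, so the sets are simply called "first" and "second" here, and the helper name is hypothetical.

```python
EFFECTOR_CONC_M = 200e-6   # 200 μM Phe or Tyr, as stated in the table footnote
MOLAR_EXCESS = 80_000      # effector-to-enzyme molar ratio, as stated in the footnote

# Enzyme concentration implied by the stated molar excess (in M).
enzyme_conc_m = EFFECTOR_CONC_M / MOLAR_EXCESS


def relative_efficiency(with_effector, control):
    """Ratio of kcat/Km with effector to kcat/Km of the control."""
    return with_effector / control


# kcat/Km values as listed in the table rows (control and + Phe).
first_set_phe = relative_efficiency(2.0e5, 2.3e5)
second_set_phe = relative_efficiency(7.9e5, 1.3e6)

print(f"Implied enzyme concentration: {enzyme_conc_m * 1e9:.1f} nM")
print(f"Relative kcat/Km with Phe: {first_set_phe:.2f} and {second_set_phe:.2f}")
```

These ratios ignore the reported standard deviations, so they should be read alongside the uncertainties given for each value rather than as standalone evidence of an effect.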